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Abstract 

We propose a new ensemble for Monte Carlo simulations, in which each state is 
assigned a statistical weight 1/k, where k is the number of states with smaller or equal 
energy. This ensemble has robust ergodicity properties and gives significant weight to 
the ground state, making it effective for hard optimization problems. It can be used 
to find free energies at all temperatures and picks up aspects of critical behaviour 
(if present) without any parameter tuning. We test it on the travelling salesperson 
problem, the Edwards- Anderson spin glass and the triangular antiferromagnet. 
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The method of Monte Carlo simulation has proved very useful for studying the ther- 
modynamic properties of model systems with moderately many degrees of freedom. The 
idea is to sample the system's phase space stochastically, using a computer to generate a 
series of random configurations. We take the phase space to consist of N discrete states 
(with label i), though the method applies equally to continuous systems. Often only a 
tiny fraction of the phase space (the part at low energy) is relevant to the properties being 
studied, due to the strong variation of the Boltzmann weight exp(— (3Ej) in the canonical 
ensemble (CE). It is then helpful to sample in an ensemble (with relative weights Wi and 
absolute probabilities Pi = Wi/ J2j w j) which is concentrated on this region of phase space. 
The Metropolis algorithm |L| samples directly in the CE, and is good at determing many 
physical properties (with the notable exception of the free energy). The price to be paid 
for this is that successive configurations are not independent (typically they have a single 
microscopic difference), but instead form a Markov chain with some equilibration time 

teq{Wi). 

We may distinguish two important characteristics of a Monte Carlo simulation: its 
ergodicity (measured by t eq (wi)) and its pertinence (measured by iV s (iy f ;Z), the average 
number of independent samples needed to obtain the information X that we seek). We 
should choose Wi so as to minimize the total number of configurations that need to be 
generated, which is proportional to t eq (wi)N s (wi;X). It is easy to specify an ensemble 
which would yield the sought information if independent samples could be drawn from 
it, but an ensemble with too much weight at low energies may become fragmented into 
"pools" at the bottoms of "valleys" of the energy function, and so have a large equilibration 
time. For example, it is well known that at low temperatures the Metropolis algorithm can 
get stuck in ordered or glassy phases. Ergodicity may be improved by sampling instead 
in a non-physical ensemble with a broad energy distribution, which allows the valleys to 
be connected by paths passing through higher energies @, |3], ||. A weight assignment 
leading to such a distribution cannot in general be written as an explicit function of energy 
alone; rather it is an algorithm's purpose to find this assignment, which then tells us 
about the density of states p(E). This reversal (starting with the distribution and finding 
the weights) of the usual Monte Carlo process can be achieved using a series of normal 
simulations, adjusting the weight W{ after each run so that the resulting energy distribution 
p Wi (E) converges to the desired one. Although one might need more samples from such 
a broad energy ensemble (BEE) than from a particular CE in order to find properties 
relating to that CE, it is possible for a single BEE simulation to provide information on 
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properties over a range of temperatures. BEEs are also helpful for finding free energies 
(since relative normalizations can only be determined for overlapping distributions [l], §]) 
and for sampling across regions of negative heat capacity in the vicinity of first-order phase 
transitions 0. 

The energy distribution used by a BEE algorithm is a free parameter ||, and is often 
taken to be uniform (this was called the "multicanonical ensemble" (MCE) by Berg f2j). 
It would, however, be natural to look for an optimal most general distribution, ie one with 
the best worst-case performance in terms of ergodicity and pertinence. We will consider 
only monotonically decreasing weight assignments Wi, implemented using the Metropolis 
scheme of accepting a transition i — > j with probability min (wj/wi, 1). Our proposal is to 
use the ensemble with weight 

Wi = —, (1) 

where ki is the number of states with energies up to and including Ei. This ensemble 
has the property that \og(N)N s independent samples from it convey as much information, 
concerning any property, as iV s independent samples from any rival ensemble || (the fac- 
tor logiV, which is a measure of the ensemble's worst-case pertinence, is smaller for this 
ensemble than for any other). In particular, of order logiV independent samples from this 
ensemble are sufficient both to find the ground state and to determine the normalization 
of the density of states. While the best worst-case ergodicity is probably obtained by 
sampling at infinite temperature, this is useless in terms of pertinence. We expect rea- 
sonable ergodicity for the 1/k ensemble since if we require a rival ensemble to assign an 
equal probability to some state, then its transition rates from this state may exceed those 
in the 1/k ensemble by a factor of at most logiV. In contrast, the equilibration time for 
uniform energy sampling may be made arbitrarily large by choosing a suitably unreason- 
ably reparametrized hamiltonian H' = f(H), where f(H) is a monotonically increasing 
function (the 1/k ensemble is invariant under such operations). 

The 1/k ensemble is equivalent to uniform entropy sampling (ie puk(E) oc dS/dE 
I 77 /.';; since for practical purposes the entropy S is given by S(Ei) ~ log ki. Like the CE, 
it has a sensible thermodynamic limit in that the relative weight of states with a single 
microscopic difference remains of order unity as M — > oo, where M is the system size. 
However, whereas in the CE fluctuations in intensive quantities such as energy density 
typically go to zero like M~i, in the 1/k ensemble they are independent of M, with the 
result that the 1/k ensemble is non-self-averaging even for simple systems such as the 
ferromagnetic Ising model. For example, if the physical system has a second order phase 
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transition at some temperature T c , this will be reflected by a power law contribution to 
the spin-spin correlation function in the 1/k ensemble |10|, with a new exponent: 



G 1/k (r) - G 1/k (oo) ~ r -V~**i)Hi-«)/» . ( 2 ) 

In spite of this, the correlation function (determined by spatial average) for a state drawn 
from the 1/k ensemble is likely to be exponentially decaying, with a random correlation 
legnth. To obtain (|2|), we first note that uniform sampling of the entropy leads to smooth 
(ie at least once differentiable) sampling of the energy, at least in systems or regimes where 
the heat capacity and temperature are strictly positive, since 

d Pi/k(E) 1 

dE a THJ' {6) 

The 1/k ensemble may be expressed as a linear combination of canonical ensembles (in the 
thermodynamic limit): 

Pl/k -> fpcE(T(E))p 1/k (E)dE 

= Jp CE (T)p 1/k (T)dT (4) 

where E is the normalized energy and p represents any probability assigned in an ensemble, 
since relative fluctuations in the CE go to zero in the thermodynamic limit. Close to the 
critical energy (letting t = (T — T c )/T) we find 

Pi/k(t)~t- a (5) 

where a is the critical exponent describing the divergence of the heat capacity. Under a 
real-space renormalization with scale factor b, pi/ k {t) is carried by the flow (t — > fe 1 ^, 
where v is the exponent describing the divergence of the correlation length) away from the 
fixed point, and so reduces by a factor &~( 1 ~ a! )/ ly . Thus there is a contribution to Gy k (r) 
which, as in the canonical critical ensemble, scales under RG transformations, though 
with an extra factor of b^^^ a ^ u . This reflection of critical properties (which normally 
require parameter tuning) in the 1/k ensemble shows that it in some sense exhibits (by 
means of non-trivial probability distributions) possible behaviours of the system over all 
temperatures. 

In principle, 1/k sampling may be implemented by an algorithm whose only parameters 
|TT| are the number of Monte Carlo steps to use at each stage of the convergence process 
(which should be enough for equilibration to have occurred, and might be determined by 
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the algorithm). Specifically, we may represent the nth approximation p n (E) to the density 
of states as a set of delta-functions and use the recurrence 

E -L6(E-Ek) 
p n+ \E) = samples 1 j (6) 

samples i 

with 

+1 _ f CT^E)- 1 if^>^ min m 

* "l^)- 1 if^<^min l j 

where a n (E) = /jf^ p n (E) dE is the integrated density of states and a n (E) is an extrapola- 
tion of a n (E) below the lowest sampled energy -E min . Note that when sampling a continuous 
space, one should use u>™ +1 = (a n (E) + (Tog-got)" 1 in order to make the sampled entropy 
range finite. <y n (E) may be evaluated in of order log Ng steps, where Ng is the number 
of delta-functions used to represent p n (memory constraints may limit Ng so that some 
grouping procedure is needed for the delta- functions). We have also found it useful to rep- 
resent p n (E) as a histogram and to compute and store the bias function W{ before each run. 
One requires only that the histogram is fine enough to resolve variation in p(E). Uniform 
energy sampling, in contrast, needs a specific choice of histogram, which must be coarse 
enough to have good statistics. In 1/k sampling, equation (|7]) automatically interpolates 
as finely as permitted by the data, short of curve fitting. However curve fitting is helpful 
in determining a n (E), since with each run the range of energies being sampled increases to 
cover energies where the predicted cr(E) used in (0) is not wrong by a large factor. The first 
run may use w® = const., which is likely to lead to progressively increasing equilibration 
times in the following runs as the sampled energy range extends further down. 

The improved ergodicity of BEEs makes them attractive for use in hard optimization 
problems M . While their applicability may be similar to that of simulated annealing (see 
eg IP- 2(1); their behaviour differs in that they offer "open-ended" improvement, since they 
never commit to a particular valley, but continue to search for better solutions. They also 
dispense with the need for a cooling schedule, which is a crucial parameter for simulated 
annealing algorithms. Although a "cautious" BEE algorithm may spend most of its time 
visiting highly non-optimal configurations, this could be offset by using parallel computa- 
tion, such as one might anticipate being readily available in the future (equilibration time, 
on the other hand, is a basic constraint on an algorithm's performance). 

Our first test of 1/k sampling is a 100-city travelling salesperson problem (see eg JT3fl), 
with moves consisting in segment transport or reversal, following ||14j| . Fig. 1 shows a 45 (E), 



5 



4 6 

= L/L opt 



Figure 1: Integrated density of states for the archived travelling salesperson problem 
"kroAlOO" fli~8 , Normalization is with respect to the established optimal tour length 



L opt , as listed in the archive. The dashed line shows p m - m = 2/99!. Inset: the optimal tour. 

where each run was continued until 2 x 10 6 transitions had been accepted, and (|7|) was 
used with the trivial extrapolation a n (E < E^ in ) = a n (E£ lin )/2. 

An additional run was conducted starting with a randomized configuration but using 
the previously obtained density of states; Fig. 2 shows the length of the best-so-far tour as 
a function of the number of cost evaluations (which we consider to be more relevant than 
computer time since it is more characteristic of an algorithm and since some optimization 
problems, eg protein folding (which has been studied using the MCE |[5|), may involve 
expensive cost calculations). This should be regarded as an upper limit for the perfor- 
mance of 1/k sampling in that not all of the 45 iterations of (](]) could be eliminated by 
extrapolating the density of states, and as a lower limit in that no parallelism was used. 

If we know N, then the absolute value of p^ in provides a useful measure of progress 
during global optimization, since Np^ in serves as an estimate for the number of states at or 
below the lowest energy sampled (assuming ergodicity). In this way, using runs of 16 x 10 6 
accepted transitions on the problem instance shown in Fig. 1, we obtained a ground state 
entropy of 0.15 ± 0.15 bits, with a variance of 0.6 bits. 

In order to compare 1/k sampling with the multicanonical ensemble, we performed sim- 
ulations of the Edwards- Anderson model with Ising spins Sj = ±1 and nearest-neighbour 
interactions = ±1 (with EJ^ = 0), on a 12 x 12 square lattice with periodic boundary 
conditions. Fig. 3 shows the energy visitation densities H(E) and the calculated entropy, 
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Figure 2: Length of the best-so-far tour as a function of the number of cost evaluations 
for a particular run. Among 10 such runs, the number of cost evaluations required to 
find the optimal tour varied between ~ 2 x 10 6 and ~ 64 x 10 6 . The plateaus are due to 
excursions back up to non-optimal configurations. 

s(E) = log 2 + log(cr(-E))/12 2 , for one realization. For 9 realizations we computed the 
ergodicity times in sweeps (MC steps per spin), following 0. We found Ty k to vary be- 
tween 1199 and 19512, with median 2025, while Ty k /Ty LCE was more sharply peaked, at 
0.69 ± 0.04. The ground state entropies were s(E ) = 0.080 ± 0.019 nats per spin. 

The last application reported here is a simulation of a regular system with frustra- 
tion, the triangular antiferro magnet, on a 48 x 48 parallelogram with periodic boundary 
conditions. Using 5 runs of 7.4 x 10 5 sweeps, we obtained a ground state entropy of 



0.32320, with a variance of 0.00015, which is consistent with the exact bulk value p6| of 
(2/tt) Jo /3 log(2 cos uj) duj ~ 0.32307. 

These simulations show that 1/k sampling has significant advantages over existing 
techniques. For the travelling salesperson problem it found the global optimum, its only 
parameter being the number of iterations to use. Lee and Choi [17] have obtained good 
results for large scale travelling salesperson problems using a "multicanonical annealing" 
algorithm which is based on the MCE, but constrained to a certain energy range which is 
then "annealed." While this approach is less greedy than simulated annealing, we believe 
that ergodic algorithms will have a higher probability of finding the global optimum in 
the limit of many samples or of much parallelization. 1/k sampling may, however, benefit 
from being truncated above some fixed energy, provided this isn't so low as to compromise 
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Figure 3: Results for simulations using 6.4 x 10 6 sweeps on one realization of the 12 x 12 
Edwards-Anderson spin glass, (a) Histogram H(E) of the energy visitation density in the 
1/k ensemble and in our implementation of the MCE following ||. (b) The entropy per 
spin for the same system. 
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ergodicity. The results for the spin glass show that 1/k sampling has faster equilibration 
and more weight for low-lying states than the MCE, though it would be worthwhile to 
continue this comparison to larger systems. It would also be interesting to compare the 
variance of the ground state entropy results for the triangular antiferromagnet with that 
obtained by other methods. 

1/k sampling may also be useful for determining the functional form of a density of 
states, since it is completely impartial on account of its reparametrization invariance. Un- 
fortunately the equilibration times of BEE algorithms are rather long, going as M 2 in the 
best case and as more than M 3 for the Edwards- Anderson spin glass . While BEE algo- 
rithms may be unnecessarily cautious for studying well-behaved systems when free energies 
are not required, the slower equilibration for the spin glass probably reflects the intrinsic 
difficulty of this problem. We suggest that 1/k sampling may be especially useful for ob- 
taining complete and reliable information on the properties of relatively small systems, 
since it has, among a large class of ensembles, the most general applicability in terms of 
the number of independent samples needed, combined with robust ergodicity properties 
and a minimum requirement for input from the user. 
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